Faster Random Walks By Rewiring Online Social Networks On-The-Fly
Many online social networks feature restrictive web interfaces which only
allow the query of a user's local neighborhood through the interface. To enable
analytics over such an online social network through its restrictive web
interface, many recent efforts reuse the existing Markov Chain Monte Carlo
methods such as random walks to sample the social network and support analytics
based on the samples. The problem with such an approach, however, is the large
number of queries often required (i.e., a long "mixing time") for a random walk
to reach a desired (stationary) sampling distribution.
In this paper, we consider a novel problem of enabling a faster random walk
over online social networks by "rewiring" the social network on-the-fly.
Specifically, we develop Modified TOpology (MTO)-Sampler which, by using only
information exposed by the restrictive web interface, constructs a "virtual"
overlay topology of the social network while performing a random walk, and
ensures that the random walk follows the modified overlay topology rather than
the original one. We show that MTO-Sampler not only provably enhances the
efficiency of sampling, but also achieves significant savings on query cost
over real-world online social networks such as Google Plus, Epinion, etc.
Comment: 15 pages, 14 figures, technical report for ICDE2013 paper. Appendix
has all the theorems' proofs; ICDE'201
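As a rough illustration of the baseline this abstract improves on, the sketch below runs a plain simple random walk through a neighbor-query interface and counts the distinct queries it issues. It is not MTO-Sampler: the overlay "rewiring" step is not shown, and the names `random_walk_sample`, `estimate_mean_degree`, and the `neighbors` callable are hypothetical stand-ins for the restrictive web interface.

```python
import random


def random_walk_sample(neighbors, start, steps, burn_in, rng):
    """Simple random walk using only local-neighborhood queries.

    neighbors: callable node -> iterable of adjacent nodes (assumed stand-in
               for the web interface; every visited node must have a neighbor).
    Returns (samples collected after burn_in, number of distinct queries).
    """
    cache = {}  # each distinct node costs one interface query

    def query(node):
        if node not in cache:
            cache[node] = list(neighbors(node))
        return cache[node]

    current = start
    samples = []
    for step in range(steps):
        current = rng.choice(query(current))  # move to a uniform random neighbor
        if step >= burn_in:
            samples.append(current)
    return samples, len(cache)


def estimate_mean_degree(samples, degree):
    """Correct the walk's degree bias with a harmonic-mean estimator.

    A simple random walk on a connected, non-bipartite undirected graph
    visits nodes with probability proportional to their degree.
    """
    return len(samples) / sum(1.0 / degree(v) for v in samples)
```

A long burn-in stands in for the mixing time here; shortening it is exactly the cost the abstract targets.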
GraphSR: A Data Augmentation Algorithm for Imbalanced Node Classification
Graph neural networks (GNNs) have achieved great success in node
classification tasks. However, existing GNNs are naturally biased towards the
majority classes with more labelled data and ignore those minority classes with
relatively few labelled ones. Traditional techniques often resort to
over-sampling methods, but these may cause overfitting. More recently,
some works propose to synthesize additional nodes for minority classes from the
labelled nodes; however, there is no guarantee that those generated nodes
really stand for the corresponding minority classes. In fact, improperly
synthesized nodes may result in insufficient generalization of the algorithm.
To resolve the problem, in this paper we seek to automatically augment the
minority classes from the massive unlabelled nodes of the graph. Specifically,
we propose GraphSR, a novel self-training strategy to augment the
minority classes with significant diversity of unlabelled nodes, which is based
on a Similarity-based selection module and a Reinforcement Learning (RL)
selection module. The first module finds a subset of unlabelled nodes which are
most similar to those labelled minority nodes, and the second one further
determines the representative and reliable nodes from the subset via an RL
technique. Furthermore, the RL-based module can adaptively determine the
sampling scale according to current training data. This strategy is general and
can be easily combined with different GNN models. Our experiments demonstrate
that the proposed approach outperforms the state-of-the-art baselines on various
class-imbalanced datasets.
Comment: Accepted by AAAI202
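The similarity-based selection step described above can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say which similarity measure or embedding GraphSR uses, so cosine similarity to the centroid of the labelled minority embeddings is assumed here, and `cosine` and `select_candidates` are hypothetical names. The RL-based second stage is not shown.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def select_candidates(minority_embeddings, unlabelled, k):
    """Return ids of the k unlabelled nodes closest to the minority class.

    minority_embeddings: list of vectors for labelled minority nodes.
    unlabelled: dict mapping node id -> embedding vector.
    Assumption: closeness is cosine similarity to the class centroid.
    """
    dim = len(minority_embeddings[0])
    count = len(minority_embeddings)
    centroid = [sum(e[i] for e in minority_embeddings) / count for i in range(dim)]
    ranked = sorted(
        unlabelled,
        key=lambda node_id: cosine(unlabelled[node_id], centroid),
        reverse=True,
    )
    return ranked[:k]
```

The returned subset would then be passed to a second filter that keeps only the reliable candidates.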
Vehicle trajectory clustering based on dynamic representation learning of internet of vehicles
With the wide adoption of Internet of Things, 5G, and smart city technologies, we are able to acquire a variety of vehicle trajectory data. These data are highly valuable: they can be used to, for instance, calculate the optimal path from one position to another, detect abnormal behavior, monitor the traffic flow in a city, and predict the next position of an object. One of the key techniques is vehicle trajectory clustering. However, existing methods mainly rely on manually designed metrics, which may lead to biased results. Meanwhile, the large scale of vehicle trajectory data has become a challenge because calculating these manually designed metrics costs more time and space. To address these challenges, we propose to employ network representation learning to achieve accurate vehicle trajectory clustering. Specifically, we first construct the k-nearest neighbor-based internet of vehicles in a dynamic manner. Then we learn the low-dimensional representations of vehicles by performing dynamic network representation learning on the constructed network. Finally, using the learned vehicle vectors, vehicle trajectories are clustered with machine learning methods. Experimental results on a real-world dataset show that our method achieves the best performance compared with baseline methods. © 2000-2011 IEEE.
Please note that there are multiple authors for this article; therefore only the names of the first 5, including Federation University Australia affiliate “Feng Xia”, are provided in this record.
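The first step of the trajectory-clustering abstract above, building a k-nearest-neighbor network of vehicles per time step, might look like the sketch below. The abstract does not give the distance measure or data layout, so Euclidean distance on 2D positions and a list of per-time-step snapshots are assumptions, and `knn_graph` and `dynamic_knn_graphs` are hypothetical names. The representation-learning and clustering steps are not shown.

```python
import math


def knn_graph(positions, k):
    """Connect each vehicle to its k nearest vehicles in one snapshot.

    positions: dict mapping vehicle id -> (x, y) coordinates.
    Returns a set of undirected edges stored as sorted id pairs.
    Assumption: Euclidean distance between positions.
    """
    edges = set()
    for vehicle, point in positions.items():
        others = sorted(
            (other for other in positions if other != vehicle),
            key=lambda other: math.dist(point, positions[other]),
        )
        for other in others[:k]:
            edges.add(tuple(sorted((vehicle, other))))
    return edges


def dynamic_knn_graphs(snapshots, k):
    """Build one k-NN graph per time step, giving a dynamic network."""
    return [knn_graph(snapshot, k) for snapshot in snapshots]
```

Each graph in the returned list corresponds to one time step, so edges appear and disappear as vehicles move.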
Not Every Couple Is a Pair: A Supervised Approach for Lifetime Collaborator Identification
While scientific collaboration can be critical for a scholar, some collaborator(s) can be more significant than others, a.k.a. lifetime collaborator(s). This work-in-progress aims to investigate whether it is possible to predict/identify lifetime collaborators given a junior scholar's early profile. For this purpose, we propose a supervised approach by leveraging scholars' local and network properties. Extensive experiments on the DBLP digital library demonstrate that lifetime collaborators can be accurately predicted. The proposed model outperforms baseline models with various predictors. Our study may shed light on the exploration of scientific collaborations from the perspective of life-long collaboration.
AIDA: Legal Judgment Predictions for Non-Professional Fact Descriptions via Partial-and-Imbalanced Domain Adaptation
In this paper, we study the problem of legal domain adaptation from
an imbalanced source domain to a partial target domain. The task aims to
improve legal judgment predictions for non-professional fact descriptions. We
formulate this task as a partial-and-imbalanced domain adaptation problem.
Although deep domain adaptation has achieved cutting-edge performance in many
unsupervised domain adaptation tasks, the negative transfer of
samples in non-shared classes makes it hard for current domain adaptation models
to solve the partial-and-imbalanced transfer problem. In this work, we explore
large-scale data from non-shared but related classes in the source domain with a
hierarchy weighting adaptation to tackle this limitation. We propose to embed a
novel pArtial Imbalanced Domain Adaptation technique (AIDA) in the deep
learning model, which can jointly borrow sibling knowledge from non-shared
classes to shared classes in the source domain and further transfer the
shared-class knowledge from the source domain to the target domain. Experimental
results show that our model outperforms the state-of-the-art algorithms.
Comment: 13 pages, 15 figures